2.
Nucleic Acids Res ; 49(D1): D575-D588, 2021 01 08.
Article in English | MEDLINE | ID: mdl-32986834

ABSTRACT

For over 10 years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources by: (i) including compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) being extensible by the user community, with all data stored in GitHub; and (iii) being designed as a biochemical 'Rosetta Stone' to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at https://modelseed.org/biochem and KBase.


Subjects
Bacteria/metabolism , Factual Databases , Fungi/metabolism , Metabolic Networks and Pathways , Molecular Sequence Annotation , Plants/metabolism , Bacteria/genetics , Bacterial Genome , Thermodynamics
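
The abstract above lists charged molecules and proton balancing among the database's distinguishing properties. The following is a minimal sketch of what a mass and charge balance check on a single reaction can look like. The data layout (a dictionary of element counts plus a charge per compound) and the function name `reaction_imbalance` are hypothetical and chosen for illustration; they are not the ModelSEED file format or API.

```python
from collections import Counter


def reaction_imbalance(stoichiometry, compounds):
    """Return (element_imbalance, charge_imbalance) for one reaction.

    stoichiometry: compound id -> coefficient (negative for reactants,
                   positive for products).
    compounds:     compound id -> (element counts dict, integer charge).
    A balanced reaction yields ({}, 0).
    """
    elements = Counter()
    charge = 0
    for compound_id, coefficient in stoichiometry.items():
        formula, compound_charge = compounds[compound_id]
        for element, count in formula.items():
            elements[element] += coefficient * count
        charge += coefficient * compound_charge
    # Keep only the elements that do not cancel out.
    unbalanced = {el: n for el, n in elements.items() if n != 0}
    return unbalanced, charge


# Illustrative compounds: water, a proton and a hydroxide ion.
COMPOUNDS = {
    "h2o": ({"H": 2, "O": 1}, 0),
    "h": ({"H": 1}, 1),
    "oh": ({"H": 1, "O": 1}, -1),
}

if __name__ == "__main__":
    balanced = {"h2o": -1, "h": 1, "oh": 1}
    missing_proton = {"h2o": -1, "oh": 1}
    print(reaction_imbalance(balanced, COMPOUNDS))
    print(reaction_imbalance(missing_proton, COMPOUNDS))
```

A reaction that is missing a proton shows up with a non-zero hydrogen count and a non-zero net charge, which is the kind of inconsistency proton balancing is meant to remove.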
3.
Pac Symp Biocomput ; 24: 172-183, 2019.
Article in English | MEDLINE | ID: mdl-30864320

ABSTRACT

The rapid acceleration of microbial genome sequencing increases opportunities to understand bacterial gene function. Unfortunately, only a small proportion of genes have been studied. Recently, TnSeq has been proposed as a cost-effective, highly reliable approach to predict gene functions as a response to changes in a cell's fitness before and after genomic changes. However, major questions remain about how best to determine whether an observed quantitative change in fitness represents a meaningful change. To address this limitation, we develop a Gaussian mixture model framework for classifying gene function from TnSeq experiments. In order to implement the mixture model, we present the Expectation-Maximization algorithm and a hierarchical Bayesian model sampled using Stan's Hamiltonian Monte-Carlo sampler. We compare these implementations against the frequentist method used in current TnSeq literature. From simulations and real data produced by E. coli TnSeq experiments, we show that the Bayesian implementation of the Gaussian mixture framework provides the most consistent classification results.


Subjects
Bacterial Genome , Genetic Models , Algorithms , Bayes Theorem , Computational Biology/methods , Computer Simulation , DNA Transposable Elements , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing , Statistical Models , Monte Carlo Method , Insertional Mutagenesis , Normal Distribution
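
The abstract above names the Expectation-Maximization algorithm as one way to fit the Gaussian mixture. As a rough illustration of that general technique only, the sketch below fits a two-component, one-dimensional mixture to simulated fitness scores. The two-component setup, the initialization, the simulated values and all function names are assumptions made for this example; it is not the authors' implementation and does not cover their hierarchical Bayesian model.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds, resp), where resp[i] is the posterior
    probability that values[i] belongs to component 1, the component
    initialized with the higher mean.
    """
    lo, hi = min(values), max(values)
    spread = max(hi - lo, 1e-6)
    means = [lo + 0.25 * spread, lo + 0.75 * spread]
    sds = [spread / 4.0, spread / 4.0]
    weights = [0.5, 0.5]
    resp = [0.5] * len(values)
    for _ in range(n_iter):
        # E-step: posterior probability of component 1 for each value.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n1 = sum(resp)
        n0 = len(values) - n1
        if min(n0, n1) < 1e-9:
            break  # one component collapsed; keep the last estimates
        means = [
            sum((1 - r) * x for r, x in zip(resp, values)) / n0,
            sum(r * x for r, x in zip(resp, values)) / n1,
        ]
        var0 = sum((1 - r) * (x - means[0]) ** 2 for r, x in zip(resp, values)) / n0
        var1 = sum(r * (x - means[1]) ** 2 for r, x in zip(resp, values)) / n1
        sds = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        weights = [n0 / len(values), n1 / len(values)]
    return weights, means, sds, resp


if __name__ == "__main__":
    # Simulated scores: 100 genes with a fitness defect, 300 neutral genes.
    random.seed(0)
    scores = [random.gauss(-2.0, 0.5) for _ in range(100)]
    scores += [random.gauss(0.0, 0.3) for _ in range(300)]
    weights, means, sds, resp = fit_two_component_gmm(scores)
    print("weights:", weights)
    print("means:", means)
```

Each gene's responsibility gives a soft classification, so a cutoff such as 0.5 can be replaced by a stricter one when fewer false calls are wanted.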
5.
Pac Symp Biocomput ; 22: 3-14, 2017.
Article in English | MEDLINE | ID: mdl-27896957

ABSTRACT

With continued rapid growth in the number and quality of fully sequenced and accurately annotated bacterial genomes, we have unprecedented opportunities to understand metabolic diversity. We selected 101 diverse and representative completely sequenced bacteria and implemented a manual curation effort to identify 846 unique metabolic variants present in these bacteria. The presence or absence of these variants acts as a metabolic signature for each of the bacteria, which can then be used to understand similarities and differences between and across bacterial groups. We propose a novel and robust method of summarizing metabolic diversity using metabolic signatures and use this method to generate a metabolic tree, clustering metabolically similar organisms. Resulting analysis of the metabolic tree confirms strong associations with well-established biological results along with direct insight into particular metabolic variants which are most predictive of metabolic diversity. The positive results of this manual curation effort and novel method development suggest that future work is needed to further expand the set of bacteria to which this approach is applied and use the resulting tree to test broad questions about metabolic diversity and complexity across the bacterial tree of life.


Subjects
Bacteria/genetics , Bacteria/metabolism , Bacterial Genome , Bacteria/classification , Computational Biology , Genetic Variation , Metabolic Networks and Pathways/genetics , Phenotype , Phylogeny
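
The abstract above treats the presence or absence of metabolic variants as a signature for each organism. One common way to compare such binary signatures is the Jaccard distance, sketched below. The abstract does not state which distance or clustering method the authors used, so Jaccard is an assumption here, and the organism names and variant identifiers are invented for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of present variants."""
    union = a | b
    if not union:
        return 0.0  # two empty signatures are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(signatures):
    """Map each organism pair (sorted by name) to its Jaccard distance."""
    return {
        (x, y): jaccard_distance(signatures[x], signatures[y])
        for x, y in combinations(sorted(signatures), 2)
    }


def closest_pair(signatures):
    """Return the pair of organisms with the most similar signatures."""
    distances = pairwise_distances(signatures)
    return min(distances, key=distances.get)


if __name__ == "__main__":
    # Hypothetical signatures: each set holds the variants that are present.
    signatures = {
        "org_A": {"v1", "v2", "v3"},
        "org_B": {"v1", "v2", "v4"},
        "org_C": {"v5"},
    }
    print(pairwise_distances(signatures))
    print(closest_pair(signatures))
```

A distance table of this kind is the usual input to an agglomerative clustering step, which repeatedly merges the closest groups to build a tree.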
6.
Front Microbiol ; 7: 1819, 2016.
Article in English | MEDLINE | ID: mdl-27933038

ABSTRACT

Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and found that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, which represent increasing degrees of phylogenetic distance from E. coli.
Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain.

7.
Front Microbiol ; 7: 1191, 2016.
Article in English | MEDLINE | ID: mdl-27555837

ABSTRACT

Numerous methods for classifying gene activity states based on gene expression data have been proposed for use in downstream applications, such as incorporating transcriptomics data into metabolic models in order to improve resulting flux predictions. These methods often attempt to classify gene activity for each gene in each experimental condition as belonging to one of two states: active (the gene product is part of an active cellular mechanism) or inactive (the cellular mechanism is not active). These existing methods of classifying gene activity states suffer from multiple limitations, including enforcing unrealistic constraints on the overall proportions of active and inactive genes, failing to leverage a priori knowledge of gene co-regulation, failing to account for differences between genes, and failing to provide statistically meaningful confidence estimates. We propose a flexible Bayesian approach to classifying gene activity states based on a Gaussian mixture model. The model integrates genome-wide transcriptomics data from multiple conditions and information about gene co-regulation to provide activity state confidence estimates for each gene in each condition. We compare the performance of our novel method to existing methods on both simulated data and real data from 907 E. coli gene expression arrays, and against experimentally measured flux values in 29 conditions, demonstrating that our method provides more consistent and accurate results than existing methods across a variety of metrics.

8.
Methods Mol Biol ; 985: 17-45, 2013.
Article in English | MEDLINE | ID: mdl-23417797

ABSTRACT

Over the past decade, genome-scale metabolic models have proven to be a crucial resource for predicting organism phenotypes from genotypes. These models provide a means of rapidly translating detailed knowledge of thousands of enzymatic processes into quantitative predictions of whole-cell behavior. Until recently, the pace of new metabolic model development was eclipsed by the pace at which new genomes were being sequenced. To address this problem, the RAST and the Model SEED framework were developed as a means of automatically producing annotations and draft genome-scale metabolic models. In this chapter, we describe the automated model reconstruction process in detail, starting from a new genome sequence and finishing on a functioning genome-scale metabolic model. We break down the model reconstruction process into eight steps: submitting a genome sequence to RAST, annotating the genome, curating the annotation, submitting the annotation to Model SEED, reconstructing the core model, generating the draft biomass reaction, auto-completing the model, and curating the model. Each of these eight steps is documented in detail.


Subjects
Metabolic Networks and Pathways/genetics , Genetic Models , Molecular Sequence Annotation/methods , User-Computer Interface , Genetic Databases , Bacterial Genome , Online Systems , DNA Sequence Analysis , Software
9.
BMC Bioinformatics ; 13: 193, 2012 Aug 08.
Article in English | MEDLINE | ID: mdl-22873695

ABSTRACT

BACKGROUND: Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. RESULTS: We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. CONCLUSIONS: Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.


Subjects
Gene Expression Profiling/methods , Bacterial Genes , Arginine/biosynthesis , Bacteria/genetics , Bacteria/metabolism , Statistical Data Interpretation , Bacterial Genome , Oligonucleotide Array Sequence Analysis , Reproducibility of Results , Transcriptome
10.
Bioinformatics ; 28(6): 891-2, 2012 Mar 15.
Article in English | MEDLINE | ID: mdl-22210867

ABSTRACT

CytoSEED is a Cytoscape plugin for viewing, manipulating and analyzing metabolic models created using the Model SEED. The CytoSEED plugin enables users of the Model SEED to create informative visualizations of the reaction networks generated for their organisms of interest. These visualizations are useful for understanding organism-specific biochemistry and for highlighting the results of flux variability analysis experiments.


Subjects
Metabolic Networks and Pathways , Biological Models , Prokaryotic Cells/metabolism , Software , Mycoplasma/cytology , Mycoplasma/metabolism
11.
J Bacteriol ; 193(13): 3228-40, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21531804

ABSTRACT

Transcriptional regulatory networks are fine-tuned systems that help microorganisms respond to changes in the environment and cell physiological state. We applied the comparative genomics approach implemented in the RegPredict Web server combined with SEED subsystem analysis and available information on known regulatory interactions for regulatory network reconstruction for the human pathogen Staphylococcus aureus and six related species from the family Staphylococcaceae. The resulting reference set of 46 transcription factor regulons contains more than 1,900 binding sites and 2,800 target genes involved in the central metabolism of carbohydrates, amino acids, and fatty acids; respiration; the stress response; metal homeostasis; drug and metal resistance; and virulence. The inferred regulatory network in S. aureus includes ∼320 regulatory interactions between 46 transcription factors and ∼550 candidate target genes comprising 20% of its genome. We predicted ∼170 novel interactions and 24 novel regulons for the control of the central metabolic pathways in S. aureus. The reconstructed regulons are largely variable in the Staphylococcaceae: only 20% of S. aureus regulatory interactions are conserved across all studied genomes. We used a large-scale gene expression data set for S. aureus to assess relationships between the inferred regulons and gene expression patterns. The predicted reference set of regulons is captured within the Staphylococcus collection in the RegPrecise database (http://regprecise.lbl.gov).


Subjects
Computational Biology/methods , Bacterial Gene Expression Regulation , Genomics/methods , Staphylococcaceae/physiology , Genetic Transcription , Humans , Protein Interaction Mapping , Regulon , Staphylococcaceae/genetics , Staphylococcaceae/metabolism
12.
Biochim Biophys Acta ; 1810(10): 967-77, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21421023

ABSTRACT

BACKGROUND: The development of next generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. SCOPE OF REVIEW: This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype: sequence annotation, and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single genome and metagenome data. MAJOR CONCLUSIONS: This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. GENERAL SIGNIFICANCE: This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.


Subjects
Genotype , Phenotype , DNA Sequence Analysis/methods , Animals , Humans , Metagenomics/methods
13.
Nat Biotechnol ; 28(9): 977-82, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20802497

ABSTRACT

Genome-scale metabolic models have proven to be valuable for predicting organism phenotypes from genotypes. Yet efforts to develop new models are failing to keep pace with genome sequencing. To address this problem, we introduce the Model SEED, a web-based resource for high-throughput generation, optimization and analysis of genome-scale metabolic models. The Model SEED integrates existing methods and introduces techniques to automate nearly every step of this process, taking approximately 48 h to reconstruct a metabolic model from an assembled genome sequence. We apply this resource to generate 130 genome-scale metabolic models representing a taxonomically diverse set of bacteria. Twenty-two of the models were validated against available gene essentiality and Biolog data, with the average model accuracy determined to be 66% before optimization and 87% after optimization.


Subjects
Bacteria/genetics , Bacteria/metabolism , Bacterial Genome/genetics , Biological Models , Bacteria/classification , Bacterial Genes , Phenotype
14.
BMC Bioinformatics ; 9: 469, 2008 Nov 05.
Article in English | MEDLINE | ID: mdl-18986519

ABSTRACT

BACKGROUND: Despite the widespread usage of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed which use biologically defined sets of genes in interpretation, instead of examining results gene-by-gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, as is typically the case for prokaryotes. RESULTS: We extend five methods of gene set analysis from use on experiments with multiple replicates, for use on experiments with few replicates. We then use simulated and real data to compare these methods with each other and with the Fisher's exact test (FET) method. As a result of the simulation we find that a method named MAXMEAN-NR maintains the nominal rate of false positive findings (type I error rate) while offering good statistical power and robustness to a variety of gene set distributions for set sizes of at least 10. Other methods (ABSSUM-NR or SUM-NR) are shown to be powerful for set sizes less than 10. Analysis of three sets of experimental data shows similar results. Furthermore, the MAXMEAN-NR method is shown to be able to detect biologically relevant sets as significant, when other methods (including FET) cannot. We also find that the popular GSEA-NR method performs poorly when compared to MAXMEAN-NR. CONCLUSION: MAXMEAN-NR is a method of gene set analysis for experiments with few replicates, as is common for prokaryotes. Results of simulation and real data analysis suggest that the MAXMEAN-NR method offers increased robustness and biological relevance of findings as compared to FET and other methods, while maintaining the nominal type I error rate.


Subjects
Oligonucleotide Array Sequence Analysis , Prokaryotic Cells/metabolism , Statistics as Topic/methods , Computer Simulation , Escherichia coli K12/genetics , Gene Expression Profiling , Salmonella typhimurium/genetics
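
The Fisher's exact test baseline mentioned in the abstract above is commonly computed as a one-sided hypergeometric tail probability. The sketch below shows that general calculation, assuming genes have first been split into "significant" and "not significant" by some cutoff. The function name and argument names are hypothetical, and the abstract does not say how the authors dichotomized genes, so this is not their exact procedure.

```python
from math import comb


def fisher_enrichment_p(n_genes, n_significant, set_size, set_significant):
    """One-sided Fisher's exact test p-value for gene set enrichment.

    Probability of drawing at least `set_significant` significant genes
    when `set_size` genes are drawn without replacement from `n_genes`
    genes, of which `n_significant` are significant.
    """
    max_overlap = min(set_size, n_significant)
    total = comb(n_genes, set_size)
    tail = 0
    for k in range(set_significant, max_overlap + 1):
        # Ways to pick k significant and (set_size - k) other genes;
        # comb returns 0 when the second pick is impossible.
        tail += comb(n_significant, k) * comb(n_genes - n_significant, set_size - k)
    return tail / total


if __name__ == "__main__":
    # Hypothetical experiment: 4000 genes, 200 called significant,
    # and a 25-gene set that contains 6 of them.
    print(fisher_enrichment_p(4000, 200, 25, 6))
```

Because the test only sees the significant or not significant label, it discards the size of each gene's change, which is one reason methods working on the full set of gene scores can be preferable.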
15.
BMC Genomics ; 9: 75, 2008 Feb 08.
Article in English | MEDLINE | ID: mdl-18261238

ABSTRACT

BACKGROUND: The number of prokaryotic genome sequences becoming available is growing steadily, and faster than our ability to accurately annotate them. DESCRIPTION: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12-24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. CONCLUSION: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.


Subjects
Computational Biology/methods , Nucleic Acid Databases , rRNA Genes/genetics , Archaeal Genome , Bacterial Genome , Open Reading Frames/genetics , Phylogeny , Proteins/genetics , Transfer RNA/genetics , Reproducibility of Results , Sensitivity and Specificity , Time Factors , User-Computer Interface
16.
BMC Bioinformatics ; 8: 139, 2007 Apr 26.
Article in English | MEDLINE | ID: mdl-17462086

ABSTRACT

BACKGROUND: Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. RESULTS: We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis. 
CONCLUSION: Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.


Subjects
Genome/genetics , Metabolic Networks and Pathways/genetics , Software Design , Bacterial Genome/genetics
17.
Conf Proc IEEE Eng Med Biol Soc ; 2004: 2984-6, 2004.
Article in English | MEDLINE | ID: mdl-17270905

ABSTRACT

The Gene Ontology (GO) contains three hierarchies that represent gene function based on categories of molecular function, biological process, and cellular component. However, additional knowledge about the mechanisms underlying gene function is buried in the GO term definitions and is not computationally accessible. We describe a process for adding new links to the GO between molecular function terms and the biological process terms that those molecular functions are involved in. These new links enable inference engines to reason about relationships between genes that have disparate molecular functions but participate in the same biological process. In particular, we demonstrate how these new links enable more effective automated analysis of gene expression data.
